AITopics | association test

Collaborating Authors

association test

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Assessing Social and Intersectional Biases in Contextualized Word Representations

Yi Chern Tan, L. Elisa Celis

Neural Information Processing SystemsFeb-11-2026, 16:20:16 GMT

Socialbiasinmachine learning hasdrawnsignificant attention, withworkranging from demonstrations of bias in a multitude of applications, curating definitions of fairness for different contexts, to developing algorithms to mitigate bias. In natural language processing, gender bias has been shown to exist in context-free word embeddings. Recently, contextual word representations have outperformed word embeddings in several downstream NLP tasks.

machine learning, natural language, urlhttp, (19 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)
North America > United States > Louisiana (0.05)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
(9 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Identifying multi-omics interactions for lung cancer drug targets discovery using Kernel Machine Regression

Ahmed, Md. Imtyaz, Hossain, Md. Delwar, Rahman, Md Mostafizer, Habib, Md. Ahsan, Rashid, Md. Mamunur, Reza, Md. Selim, Alam, Md Ashad

arXiv.org Artificial IntelligenceOct-21-2025

Cancer exhibits diverse and complex phenotypes driven by multifaceted molecular interactions. Recent biomedical research has emphasized the comprehensive study of such diseases by integrating multi-omics datasets (genome, proteome, transcriptome, epigenome). This approach provides an efficient method for identifying genetic variants associated with cancer and offers a deeper understanding of how the disease develops and spreads. However, it is challenging to comprehend complex interactions among the features of multi-omics datasets compared to single omics. In this paper, we analyze lung cancer multi-omics datasets from The Cancer Genome Atlas (TCGA). Using four statistical methods, LIMMA, the T test, Canonical Correlation Analysis (CCA), and the Wilcoxon test, we identified differentially expressed genes across gene expression, DNA methylation, and miRNA expression data. We then integrated these multi-omics data using the Kernel Machine Regression (KMR) approach. Our findings reveal significant interactions among the three omics: gene expression, miRNA expression, and DNA methylation in lung cancer. From our data analysis, we identified 38 genes significantly associated with lung cancer. From our data analysis, we identified 38 genes significantly associated with lung cancer. Among these, eight genes of highest ranking (PDGFRB, PDGFRA, SNAI1, ID1, FGF11, TNXB, ITGB1, ZIC1) were highlighted by rigorous statistical analysis. Furthermore, in silico studies identified three top-ranked potential candidate drugs (Selinexor, Orapred, and Capmatinib) that could play a crucial role in the treatment of lung cancer. These proposed drugs are also supported by the findings of other independent studies, which underscore their potential efficacy in the fight against lung cancer.

artificial intelligence, interaction, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2510.16093

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
Health & Medicine > Therapeutic Area > Oncology > Lung Cancer (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Assessing Social and Intersectional Biases in Contextualized Word Representations

Yi Chern Tan, L. Elisa Celis

Neural Information Processing SystemsOct-2-2025, 08:45:57 GMT

Neural Information Processing Systems http://nips.cc/

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)
North America > United States > Louisiana (0.14)

Genre: Research Report > New Finding (0.95)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

Add feedback

Functional Analysis of Variance for Association Studies

Vsevolozhskaya, Olga A., Zaykin, Dmitri V., Greenwood, Mark C., Wei, Changshuai, Lu, Qing

arXiv.org Artificial IntelligenceAug-18-2025

While progress has been made in identifying common genetic variants associated with human diseases, for most of common complex diseases, the identified genetic variants only account for a small proportion of heritability. Challenges remain in finding additional unknown genetic variants predisposing to complex diseases. With the advance in next-generation sequencing technologies, sequencing studies have become commonplace in genetic research. The ongoing exome-sequencing and whole-genome-sequencing studies generate a massive amount of sequencing variants and allow researchers to comprehensively investigate their role in human diseases. The discovery of new disease-associated variants can be enhanced by utilizing powerful and computationally efficient statistical methods. In this paper, we propose a functional analysis of variance (FANOVA) method for testing an association of sequence variants in a genomic region with a qualitative trait. The FANOVA has a number of advantages: (1) it tests for a joint effect of gene variants, including both common and rare; (2) it fully utilizes linkage disequilibrium and genetic position information; and (3) allows for either protective or risk-increasing causal variants. Through simulations, we show that FANOVA outperform two popularly used methods - SKAT and a previously proposed method based on functional linear models (FLM), - especially if a sample size of a study is small and/or sequence variants have low to moderate effects. We conduct an empirical study by applying three methods (FANOVA, SKAT and FLM) to sequencing data from Dallas Heart Study. While SKAT and FLM respectively detected ANGPTL 4 and ANGPTL 3 associated with obesity, FANOVA was able to identify both genes associated with obesity.

artificial intelligence, machine learning, variant, (19 more...)

arXiv.org Artificial Intelligence

doi: 10.1371/journal.pone.0105074

2508.11069

Country: North America > United States > Michigan (0.28)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.94)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

A Generalized Similarity U Test for Multivariate Analysis of Sequencing Data

Wei, Changshuai, Lu, Qing

arXiv.org Artificial IntelligenceAug-18-2025

Summary: Sequencing-based studies are emerging as a major tool for genetic association studies of complex diseases. These studies pose great challenges to the traditional statistical methods because of the high-dimensionality of data and the low frequency of genetic variants. Moreover, there is a great interest in biology and epidemiology to identify genetic risk factors contributed to multiple disease phenotypes. The multiple phenotypes can often follow different distributions, which brings an additional challenge to the current statistical framework. In this paper, we propose a generalized similarity U test, referred to as GSU. GSU is a similarity-based test that can handle high-dimensional genotypes and phenotypes. We studied the theoretical properties of GSU, and provided the efficient p-value calculation for association test as well as the sample size and power calculation for the study design. Through simulation, we found that GSU had advantages over existing methods in terms of power and robustness to phenotype distributions. Finally, we used GSU to perform a multivariate analysis of sequencing data in the Dallas Heart Study and identified a joint association of 4 genes with 5 metabolic related phenotypes. Key words: Weighted U Statistic; Sequencing Study; Non-parametric Statistics. 1. Introduction Genome-wide association studies (GW AS) have made substantial progress in discovering common genetic variants associated with complex diseases. Despite such success, a large proportion of heritability of complex diseases remains unexplained.

artificial intelligence, machine learning, phenotype, (19 more...)

arXiv.org Artificial Intelligence

1505.01179

Country:

North America > United States > Texas (0.28)
North America > United States > Michigan (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

Reviews: Assessing Social and Intersectional Biases in Contextualized Word Representations

Neural Information Processing SystemsJan-22-2025, 05:42:42 GMT

I look forward to the final version including more details about the tests, as requested by reviewer 2.] This paper studies the presence of social biases in contextualized word representations. First, word co-occurnce statistics of pronouns and stereotypical occupations are provided for various datasets used for training contextualizers. Then, the word/sentence embedding association test is extended for the contextual case. Using templates, instead of aggregating over word representations (in sentence test) or taking the context-free word embedding (in word test), the contextual word representation is used. Then, an association test compares the association between a concept and an attribute using a permutation test.

contextualized word representation, representation, word representation, (9 more...)

Neural Information Processing Systems

Genre: Research Report (0.54)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)

Add feedback

Investigating Gender Bias in Turkish Language Models

Caglidil, Orhun, Ostendorff, Malte, Rehm, Georg

arXiv.org Artificial IntelligenceApr-17-2024

Language models are trained mostly on Web data, which often contains social stereotypes and biases that the models can inherit. This has potentially negative consequences, as models can amplify these biases in downstream tasks or applications. However, prior research has primarily focused on the English language, especially in the context of gender bias. In particular, grammatically gender-neutral languages such as Turkish are underexplored despite representing different linguistic properties to language models with possibly different effects on biases. In this paper, we fill this research gap and investigate the significance of gender bias in Turkish language models. We build upon existing bias evaluation frameworks and extend them to the Turkish language by translating existing English tests and creating new ones designed to measure gender bias in the context of T\"urkiye. Specifically, we also evaluate Turkish language models for their embedded ethnic bias toward Kurdish people. Based on the experimental results, we attribute possible biases to different model characteristics such as the model size, their multilingualism, and the training corpora. We make the Turkish gender bias dataset publicly available.

gender bia, sentence-level test, significance, (17 more...)

arXiv.org Artificial Intelligence

2404.11726

Country:

Asia > Middle East > Republic of Türkiye (0.14)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.70)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

A Unified Combination Framework for Dependent Tests with Applications to Microbiome Association Studies

Yu, Xiufan, Zhang, Linjun, Srinivasan, Arun, Xie, Min-ge, Xue, Lingzhou

arXiv.org Machine LearningApr-14-2024

We introduce a novel meta-analysis framework to combine dependent tests under a general setting, and utilize it to synthesize various microbiome association tests that are calculated from the same dataset. Our development builds upon the classical meta-analysis methods of aggregating $p$-values and also a more recent general method of combining confidence distributions, but makes generalizations to handle dependent tests. The proposed framework ensures rigorous statistical guarantees, and we provide a comprehensive study and compare it with various existing dependent combination methods. Notably, we demonstrate that the widely used Cauchy combination method for dependent tests, referred to as the vanilla Cauchy combination in this article, can be viewed as a special case within our framework. Moreover, the proposed framework provides a way to address the problem when the distributional assumptions underlying the vanilla Cauchy combination are violated. Our numerical results demonstrate that ignoring the dependence among the to-be-combined components may lead to a severe size distortion phenomenon. Compared to the existing $p$-value combination methods, including the vanilla Cauchy combination method, the proposed combination framework can handle the dependence accurately and utilizes the information efficiently to construct tests with accurate size and enhanced power. The development is applied to Microbiome Association Studies, where we aggregate information from multiple existing tests using the same dataset. The combined tests harness the strengths of each individual test across a wide range of alternative spaces, %resulting in a significant enhancement of testing power across a wide range of alternative spaces, enabling more efficient and meaningful discoveries of vital microbiome associations.

cauchy combination, combination method, dependence, (16 more...)

arXiv.org Machine Learning

2404.09353

Country: North America > United States > Wisconsin (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.46)

Add feedback

Effect of dimensionality change on the bias of word embeddings

Rai, Rohit Raj, Awekar, Amit

arXiv.org Artificial IntelligenceDec-28-2023

Word embedding methods (WEMs) are extensively used for representing text data. The dimensionality of these embeddings varies across various tasks and implementations. The effect of dimensionality change on the accuracy of the downstream task is a well-explored question. However, how the dimensionality change affects the bias of word embeddings needs to be investigated. Using the English Wikipedia corpus, we study this effect for two static (Word2Vec and fastText) and two context-sensitive (ElMo and BERT) WEMs. We have two observations. First, there is a significant variation in the bias of word embeddings with the dimensionality change. Second, there is no uniformity in how the dimensionality change affects the bias of word embeddings. These factors should be considered while selecting the dimensionality of word embeddings.

dimensionality, dimensionality change, fasttext, (16 more...)

arXiv.org Artificial Intelligence

2312.17292

Country:

Asia > India > West Bengal > Kolkata (0.06)
Asia > India > Assam > Guwahati (0.05)
Oceania > Australia > Victoria > Melbourne (0.05)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Software (0.95)

Add feedback

A Kernel-Based Neural Network Test for High-dimensional Sequencing Data Analysis

Hou, Tingting, Jiang, Chang, Lu, Qing

arXiv.org Machine LearningDec-5-2023

The recent development of artificial intelligence (AI) technology, especially the advance of deep neural network (DNN) technology, has revolutionized many fields. While DNN plays a central role in modern AI technology, it has been rarely used in sequencing data analysis due to challenges brought by high-dimensional sequencing data (e.g., overfitting). Moreover, due to the complexity of neural networks and their unknown limiting distributions, building association tests on neural networks for genetic association analysis remains a great challenge. To address these challenges and fill the important gap of using AI in high-dimensional sequencing data analysis, we introduce a new kernel-based neural network (KNN) test for complex association analysis of sequencing data. The test is built on our previously developed KNN framework, which uses random effects to model the overall effects of high-dimensional genetic data and adopts kernel-based neural network structures to model complex genotype-phenotype relationships. Based on KNN, a Wald-type test is then introduced to evaluate the joint association of high-dimensional genetic data with a disease phenotype of interest, considering non-linear and non-additive effects (e.g., interaction effects). Through simulations, we demonstrated that our proposed method attained higher power compared to the sequence kernel association test (SKAT), especially in the presence of non-linear and interaction effects. Finally, we apply the methods to the whole genome sequencing (WGS) dataset from the Alzheimer's Disease Neuroimaging Initiative (ADNI) study, investigating new genes associated with the hippocampal volume change over time.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Machine Learning

2312.0285

Country:

North America > United States > Michigan (0.04)
North America > Canada (0.04)

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.55)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback